ABSTRACT

A study was conducted to explore the reliability and
validity of three prominent procedures used in informal reading
inventories (IRIs): (1) choosing a 95% word recognition accuracy
standard for determining student instructional level, (2) arbitrarily
selecting a passage to represent the difficulty level of a basal
reader, and (3) employing one-level floors and ceilings of
performance to demarcate levels beyond which behavior is not sampled.
Subjects were 91 elementary school students, representing a range of 
reading abilities. The students completed word recognition and 
passage comprehension tests and then individually read passages from 
each of the ten reading levels in the Ginn 720 and the nine levels of 
the Scott-Foresman Unlimited reading series. Correlational and
congruency analyses of the resulting data supported the validity of
the 95% word recognition accuracy standard, but raised questions
about the reliability and validity of the passage sampling procedures 
and the use of one-level floors and ceilings of performance. The 
findings suggest that IRI procedures for selecting passages from 
basal readers and for sampling students' performance at instructional 
levels may have a negative effect on educational practice. Sampling 
over time and test forms is a more valid IRI procedure. (FL) 

Abstract

Informal Reading Inventories (IRIs) are endorsed frequently
by textbook authors and teacher trainers. However, the reliability
and validity of standard and salient IRI procedures rarely have been
investigated. Employing 91 elementary age students, this study
examined the technical adequacy of (a) choosing a criterion of 95%
accuracy for word recognition to determine an instructional level, (b)
selecting arbitrarily a passage to represent the difficulty level of
a basal reader, and (c) employing one-level floors and ceilings to
demarcate levels beyond which behavior is not sampled. Correlational
and congruency analyses supported the external validity of the 95%
standard but questioned the reliability and validity of passage
sampling procedures and one-level floors and ceilings. Sampling
over occasions and test forms is discussed as a more valid IRI
procedure.

Reliability and Validity of Curriculum-Based
Informal Reading Inventories 

Certain norm-referenced tests possess strong technical adequacy. 
Their reliability, together with their capacity to compare the per- 
formance of an individual pupil to the performance of a group of simi- 
lar students, makes them both well suited as instruments for screening 
and, in some instances, useful for placing pupils in special programs
(Salvia & Ysseldyke, 1981). Most normative measures, however, do not
have adequate content validity; standardized test items infrequently
reflect the content of curricula employed in classrooms (Armbruster,
Stevens, & Rosenshine, 1977; Eaton & Lovitt, 1972; Jenkins & Pany,
1978). Thus, normative tests have limited utility for placing pupils 
in specific instructional programs. 

Many years ago, educators with an interest in reading instruction
recognized the disparity between the content of standardized tests and
the content of classroom curricula. Awareness of this incongruency
fueled efforts, such as those by Wheat in 1923, to construct informal
reading devices that would be more sensitive to classroom instruction
and thereby would be more accurate in assessing students' strengths
and weaknesses and their instructional levels (Beldin, 1970). 

Curriculum-based Informal Reading Inventories (IRIs) represent
one such alternative to normative tests for assessing students' reading
behavior. While the extent to which they are employed by classroom
teachers is unclear, they are frequently and strongly endorsed by
textbook authors and teacher trainers (e.g., Lowell, 1970). Kelly (1970)
typified many academicians' admiration of IRIs when he wrote: "Reading
authorities agree that the informal reading inventory represents one
of the most powerful instruments readily available to the classroom
teacher for assessing a pupil's instructional reading level" (p. 112).

In spite of, or perhaps because of, this popularity, the soundness 
of procedures that typically govern the use of curriculum-based IRIs 
rarely has been investigated. This apparent lack of concern may be 
handicapping educators' efforts to determine accurately students' in- 
structional levels. Evidence for this is provided in occasional studies 
that investigated the reliability of IRI procedures. 

Procedures for Sampling IRI Passages

One prominent feature of curriculum-based IRIs is the procedure of
selecting passages by drawing arbitrarily from texts (Beery, Barrett,
& Powell, 1969; Bush & Huebner, 1970; Johnson & Kress, 1969). The
adequacy of this sampling procedure rests on the assumption that passages
are likely to be representative of the texts from which they were selected.
The correctness of this presumption has been questioned indirectly.
Investigations have established that extreme variation exists in the
readability of basal readers. Not only is there great divergence among
basal readers of equal grade designations from different series
(Pikulski, 1974), but also there is dramatic variation in passages
within the same text (Bradley & Ames, 1977; Fitzgerald, 1980). Such
variation suggests that the practice of representing a book's
readability level with arbitrarily drawn samples is inadequate, and that
this practice may lead to inappropriate instructional placements.

Ceilings and Floors on Performance

While the foregoing concern questions the precision with which
passages represent the difficulty of basal readers, a second concern
deals with the adequacy with which curriculum-based IRI procedures
sample students' reading skills.

Typically, the first level at which a student fails to meet a
criterion of mastery is designated the pupil's "ceiling," and there
is no further assessment of reading behavior at levels of difficulty
beyond this point. Similarly, reading behavior is not assessed below
the level at which a pupil first reads proficiently. This level is
designated the student's "floor." The belief that assessment is
unnecessary below the one-level floor and above the one-level ceiling
rests upon at least two important assumptions. The first is that the
difficulty of a series of basal passages progresses steadily so that
levels above a ceiling and below a floor represent, respectively,
advanced selections and mastered material. This assumption, as discussed
above, appears shaky. Second, given materials that are graduated
accurately in difficulty, it is assumed that a consistent, inverse
relationship exists between the quality of reading behavior and
passage difficulty, so that as the difficulty levels of successive
passages increase, the reading performance of a student necessarily
worsens. Despite the importance of this second assumption to the
use of ceilings and floors within IRIs, no pertinent empirical
investigations have been identified.

Criteria for Instructional Levels of Performance

In addition to the questionable or unknown reliability of practices
that direct the sampling of reading materials and the sampling
of reading behaviors, a third prominent feature of IRIs further obscures
the usefulness of the informal reading assessment strategy. This
third component is the criterion chosen to determine pupils' levels
of reading instruction.

There is no widespread consensus on standards to use for the
identification of a pupil's instructional level (Kender, 1969).
Traditional criteria in evaluating word accuracy and comprehension are 95%
and 75%, respectively. The popularity of this convention, attributed
to Betts (Beldin, 1970), is suggested by its use in inventories developed
by Harris, Botel, Kress, and Johnson, and Austin and Huebner (Powell,
1971). However, departures from Betts' standards have been numerous
and, in some cases, dramatic. Smith (1959), for example, employed a
criterion of 80% for word accuracy and 70% for comprehension. Cooper
(1952) suggested 96% and 60% as criteria for word accuracy and
comprehension, respectively, in the primary grades, and 98% for word accuracy
and 70% for comprehension in the intermediate grades. Spache (cited in
Lowell, 1970) employed 60% and ^5% as satisfactory lower limits of
performance for word accuracy and comprehension, respectively.

More important than the lack of agreement on the usefulness of
Betts' standards is the indication that the 95% word recognition
criterion may have weak internal validity. According to Powell (1971),
its possible incorrectness is indicated in two ways. First, Killgallon's
data, on which the Betts convention is based, appear insufficient in
that (a) they represent the performance of only 41 fourth grade
students, and (b) the interpretation of subjects' scores was gratuitous.
Second, Powell demonstrated that first and second graders could
tolerate an average word recognition score of only 85% and still
maintain 70% comprehension. Pupils in grades 3 through 6 could
achieve 70% comprehension with an average word accuracy performance
of 91% to 94%. Thus, regardless of grade level, the 95% word
recognition criterion was not supported. This finding has received
corroboration from Pikulski (1974).

In addition to the questionable internal validity of Betts'
standards, persuasive evidence of their external validity is lacking (Kender,
1969). Few studies have attempted to validate the traditional criteria
for word accuracy and comprehension against external standards, and
available investigations disagree in their findings.

Three studies exemplify this last point. Oliver and Arnold (1978)
found that the Iowa Test of Basic Skills (ITBS) correlated more strongly
than did the Goudy IRI with teacher judgments concerning the
instructional placements of students. Arnold and Arnold (1966) obtained similar
results using a curriculum-based IRI, the Gates-MacGinitie Reading Tests,
and the Wide Range Achievement Test. However, Botel (1968) found that
the Botel Reading Inventory had higher correlations with pupils' actual
instructional levels than did the California Reading Test, ITBS, and STEP.

Any conclusions that may be drawn from these conflicting findings
become even more tentative in light of several methodological problems
in the studies. All of the studies used achievement tests of
questionable psychometric adequacy (cf. Ysseldyke, 1979). Also, the studies of
Arnold and Arnold (1966) and Oliver and Arnold (1978) used (a) teacher
judgments about the placement of pupils for instruction rather than the
teachers' actual placements of students, and (b) small samples that
precluded reliable correlations (Nunnally, 1959). Therefore, the
instructional performance standard traditionally employed in IRIs lacks
both external and internal validity.

In summary, with their high content validity, many curriculum-based
IRIs are strong precisely in a way in which most norm-referenced
tests are weak. Alternately, however, salient IRI procedures have yet
to demonstrate the high degree of reliability that characterizes some
standardized instruments. This remains so despite the frequency with
which IRIs have been advocated by textbook authors and teacher trainers.
The purpose of the present study was to explore the reliability and
validity of the three prominent IRI procedures discussed above. This
exploration was undertaken not to contribute to the elimination of IRIs
but rather to clarify the legitimacy of their use or to strengthen the
manner in which they are employed. Specifically, the study (a) explored
how many sample passages from basal textbooks were required before the
readability levels of the passages represented the readability levels
of the textbooks, (b) investigated the consistency of the relationship
between pupils' reading performance and passage level difficulty to
ascertain the adequacy of current practices that establish floors and
ceilings of performance, and (c) examined an array of word recognition
criteria to determine which standards, if any, demonstrated acceptable
external validity with respect to achievement tests and teacher
placements for instruction.

Method 

Subjects 

Subjects were 91 students (51 boys and 40 girls) randomly selected
from one public elementary school in a metropolitan school district
in the Midwest. The numbers of subjects in grades 1-6, respectively,
were 14, 17, 15, 18, 16, and 11. Fifteen subjects (16%) participated
in a special education resource program, and another 23 subjects (25%)
were enrolled in a Title I program for students who had been
designated by their teachers as seriously behind in reading.

Measures

Achievement tests. Two tests were selected from the Woodcock
Reading Mastery Tests (WRMT): Word Identification (WI) and Passage
Comprehension (PC). The WI test requires that students read aloud isolated
words. There are 150 words ranging in difficulty from preprimer to
beyond 12th grade level (Woodcock, 1973). The PC test contains 85
items that employ a modified cloze procedure (Bormuth, 1969). Pupils
are asked to read silently a passage from which a word has been deleted
and to produce verbally an appropriate missing word. The passages
range in difficulty from first grade to college level (Woodcock, 1973).

Teacher placements. The classroom teacher of each student reported
the book level in the Ginn 720 reading series from which the pupil
read for instructional purposes.

Basal readers. Two basal reading series were employed, Ginn
720 (1976) and Scott-Foresman Unlimited (1976). They were chosen as
exemplars of popular and contrasting approaches to reading instruction.
Ginn 720 emphasizes a combination of phonetic, linguistic, and
structural skills, whereas Scott-Foresman Unlimited places primary emphasis
on comprehension and study skills.

Procedure

Before testing. Two 100-word passages were selected as measures
from each of 10 reading levels in Ginn 720 and 9 reading levels in
Scott-Foresman Unlimited. To ensure that these passages were
representative of the reading difficulty of the levels from which they
were chosen, the following procedure, adapted from Fuchs and Balow
(1974), was employed. First, five pages were chosen at random from
(a) the last 25% of the pages constituting each reading level, and
(b) pages that were not dominated by phonics exercises, dialogue,
indentations, and proper nouns. Second, on each of these five pages a
100-word passage was identified. Third, for each passage a readability
score was calculated. The Spache Readability Formula (Spache, 1953)
was applied to passages in books from preprimer through third grade
and the Dale-Chall Formula for Predicting Readability (Dale & Chall,
1948) was used for passages in books from fourth grade through sixth
grade. Fourth, the average readability of the five passages at each
reading level was determined. Last, if the readability scores of two
passages were within one month of the mean readability score of the
five passages, then these two passages were selected as representative
of that level. However, if two passages could not be identified, then
a sixth passage was randomly chosen and steps two through five were
repeated. This procedure was repeated until two appropriate passages
were found.
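
The selection steps above can be sketched in code. The following Python
sketch is illustrative only: the function name, the `readability`
callback (a stand-in for the Spache or Dale-Chall computation), and the
treatment of one month as 0.1 grade are assumptions, not details
reported in the study.

```python
import random
from statistics import mean

MONTH = 0.1  # assumption: one school month equals 0.1 grade level


def select_representative(pages, readability, rng, start=5, tol=MONTH):
    """Sample pages one at a time until two passages fall within `tol`
    of the mean readability of all passages sampled so far.

    Returns (two chosen pages or None, passages sampled, mean or None).
    """
    pool = list(pages)
    rng.shuffle(pool)                      # random order of eligible pages
    sampled = []
    for page in pool:
        sampled.append((page, readability(page)))
        if len(sampled) < start:           # begin with five passages
            continue
        avg = mean(score for _, score in sampled)
        close = [p for p, s in sampled if abs(s - avg) <= tol + 1e-9]
        if len(close) >= 2:
            return close[:2], len(sampled), avg
    return None, len(sampled), None        # eligible pages exhausted


# Hypothetical readability scores (grade units) for six eligible pages.
scores = {11: 2.0, 12: 2.05, 13: 1.95, 14: 2.0, 15: 2.0, 16: 2.1}
picked, n_sampled, level_mean = select_representative(
    list(scores), scores.get, random.Random(0)
)
```

The count of passages sampled is returned because the study reports
that count for each reading level.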

Also preceding assessment, classroom teachers indicated the reading
level to which each subject was assigned for classroom instruction.

During testing. Subjects individually were administered the
WI and PC tests and were asked to read passages from each of the 10
reading levels in the Ginn 720 and the 9 levels in the Scott-Foresman
Unlimited series. This was accomplished in one 45 to 60 minute session
in the subject's home school. Testing was conducted by trained research
and psychometric assistants.

The reading passages from the basal readers were administered
in random order. Preceding the presentation of the first passage,
the examiner said, "I want you to read aloud to me as quickly as you
can. If you don't know a word, skip it. Try your hardest and remember
to read quickly. I'll tell you when to stop." The examiner then
presented a copy of the passage, directed the subject to begin, and activated
a stopwatch. Subjects were permitted 60 seconds in which to read each
passage. The examiner scored each subject's performance by crossing
out insertions, substitutions, mispronunciations, and omissions. For
each passage, three scores were generated for the subject: the number
and percentage of words read correctly and the number of words read
incorrectly. For subjects who completed reading a passage in less than
the allotted time, the time (in seconds) required by the subject was
indicated.
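
The three scores can be illustrated with a short Python sketch. The
function name and the conversion of early finishers' counts to
per-minute rates are assumptions made for illustration; the study
reports only that reading time was noted for subjects who finished in
under 60 seconds.

```python
def score_reading(words_attempted, errors, seconds=60.0):
    """Return (words correct, percent correct, words correct per minute,
    errors per minute) for one timed passage reading."""
    correct = words_attempted - errors
    pct = 100.0 * correct / words_attempted if words_attempted else 0.0
    scale = 60.0 / seconds        # assumption: prorate to a full minute
    return correct, pct, correct * scale, errors * scale


# Hypothetical subject: finishes a 100-word passage in 50 s with 5 errors.
correct, pct, wpm, epm = score_reading(100, 5, seconds=50.0)
```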

Following testing. Seven criteria were used for judging
instructional levels in each of the two reading series. The criteria are
defined below. For each criterion, an instructional level was assigned
to each subject by identifying the highest reading level at which the
subject met the standards before unsatisfactory performance was
demonstrated at two consecutive levels.

Criterion 1: for Pre-Primer (PP) through grade 3 books,
             30-49 words per minute (wpm) with seven or
             fewer errors per minute (epm); for grade 4
             through grade 6 books, 50 or more (+) wpm
             with seven or fewer epm.

Criterion 2: 70 + wpm with 10 or fewer epm.

Criterion 3: 100 + wpm with 0-2 epm.

Criterion 4: 95% accuracy.

Criterion 5: 70 + wpm with 95% accuracy.

Criterion 6: for PP through grade 2 books, 50 + wpm with
             95% accuracy; for grade 3 through grade 6 books,
             70 + wpm with 95% accuracy.

Criterion 7: for PP through grade 2 books, 50 + wpm with 85%
             accuracy; for grade 3 through grade 6 books,
             70 + wpm with 95% accuracy.

Criteria 1-3 were selected because they are employed frequently by
precision teachers (Alper, Nowlin, Lemoine, Perine, & Bettencourt, 1973;
Haughton, 1973; Starlin, 1979; Starlin & Starlin, 1974). Criterion 4
was chosen because it is the traditional standard among users and
advocates of IRIs for identifying pupils' instructional levels (Beldin, 1970).
Criteria 5 and 6 were devised for this study, and represent combinations
of the rate and percentage-accuracy criteria found in the first three
criteria. In Criterion 7, an 85% accuracy standard for students in
books PP-2 was introduced. Its selection was based on Powell's (1971)
demonstration that PP through grade 2 readers maintained 70%
comprehension while their word recognition accuracy was at 85% or better.
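
The assignment rule and three of the criteria can be expressed as a
brief Python sketch. The function names, the coding of book grade with
PP as 0, and the return value of None when no level is met are
assumptions introduced for illustration.

```python
def criterion_4(grade, wpm, pct):
    """95% accuracy, regardless of rate or book grade."""
    return pct >= 95


def criterion_6(grade, wpm, pct):
    """50+ wpm (PP through grade 2) or 70+ wpm (grades 3-6), 95% accuracy."""
    min_rate = 50 if grade <= 2 else 70
    return wpm >= min_rate and pct >= 95


def criterion_7(grade, wpm, pct):
    """As Criterion 6, but with 85% accuracy for PP through grade 2."""
    if grade <= 2:
        return wpm >= 50 and pct >= 85
    return wpm >= 70 and pct >= 95


def instructional_level(met):
    """Index of the highest level met before two consecutive failures.

    `met` lists pass/fail results ordered from lowest to highest level.
    """
    best, fails = None, 0
    for i, ok in enumerate(met):
        if ok:
            best, fails = i, 0
        else:
            fails += 1
            if fails == 2:        # two consecutive unsatisfactory levels
                break
    return best


# Hypothetical subject: (book grade, wpm, percent correct) per level.
readings = [(0, 80, 99), (1, 75, 97), (2, 60, 93), (3, 72, 96),
            (4, 55, 90), (5, 40, 85), (6, 71, 96)]
level = instructional_level([criterion_6(*r) for r in readings])
```

In this hypothetical record the pass at the highest level is ignored,
because two consecutive failures occur before it.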

Results

Representativeness of Sample Passages 

Table 1 displays the reading levels from the Ginn 720 and
Scott-Foresman Unlimited series and corresponding readability scores both as
reported by publishers and as derived from readability formulae. As
shown in Table 1, means of the scores produced by readability formulae
were calculated (a) across the total number of passages sampled at each
reading level, and (b) on the two 100-word passages at each reading level
that were used as measures in the study. Additionally, Table 1 displays
the number of passages sampled at each reading level before the readability
scores of two passages coincided with the mean readability scores for the
reading levels. The number of passages necessary to achieve adequate
representation ranged from 5 to 14. Of 19 textbooks in both reading
series, 10 (53.00%) required the selection of 10 or more passages before
two representative passages could be identified.

Insert Table 1 about here 



Difficulty of Passages and Variability of Performance Across Reading 
Levels 

Increasing passage difficulty. Within the two basal series, the
mean readability scores of adjacent levels were compared. Differences
between pairs of scores, as well as the values of the t tests, are
presented in Table 2. These contrasts indicate that, for both basal
series, the readability scores of the passages increased steadily at
successively higher book levels. In Ginn 720, readability scores
increased an average .44 grades; in Scott-Foresman Unlimited, scores
increased an average .43 grades. Seven of the nine contrasts for Ginn
720 were statistically significant. In Scott-Foresman Unlimited, only
three of the eight comparisons were significant. This suggests greater
reliability for the differences between adjacent levels in the Ginn 720
series than in the Scott-Foresman Unlimited series. However, given
nearly identical increases in readability scores in the two basal
series (mean = .44 grades for Ginn 720; mean = .43 grades for
Scott-Foresman Unlimited), this greater reliability seems to be due to reduced
variability in the readability of passages in Ginn 720 rather than to larger
differences in the readability scores between selected passages.



Insert Table 2 about here 



Variability of student performance. Two analyses were employed
to determine whether performance decreased as the difficulty of sample
passages increased. The first analysis examined the group's mean
performance on increasingly more difficult passages.

Figure 1 displays mean words correct per minute (wpm), mean errors
per minute (epm), and mean percentage correct (pc) scores in both basal
series. Trend lines (White, 1971) were computed on and drawn through
the data in Figure 1. The trend lines revealed a negative slope for
mean wpm scores (-5.33 in Ginn 720 and -2.56 in Scott-Foresman Unlimited)
and for mean pc scores (-3.50 in Ginn 720 and -.88 in Scott-Foresman
Unlimited). As expected, the mean performance scores generally decreased
as passage difficulty increased. However, this was not a consistent
performance pattern. Of 17 pairs of adjacent passages that increased
in difficulty, 13 pairs (76.00%) of mean wpm scores and only 11 pairs
(65.00%) of mean pc scores decreased. This inconsistency in performance
is more obvious with respect to the mean epm scores. While the trend
line for Ginn 720, as anticipated, was positively sloped (+.89), the
trend line for Scott-Foresman Unlimited was flat. Moreover, among
the 17 pairs of sample passages that increased in difficulty, only
9 pairs (53.00%) of mean epm scores increased.
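
The pairwise tally can be illustrated with a minimal Python sketch. The
function name and the sample means are hypothetical; only the counts
13, 11, and 9 of 17 pairs come from the text above.

```python
def share_of_pairs(means, expect="decrease"):
    """Count adjacent level pairs whose mean score moves as expected.

    `means` is ordered from the easiest to the hardest passage level.
    Returns (pairs moving as expected, total pairs, percentage).
    """
    pairs = list(zip(means, means[1:]))
    if expect == "decrease":
        hits = sum(later < earlier for earlier, later in pairs)
    else:
        hits = sum(later > earlier for earlier, later in pairs)
    return hits, len(pairs), 100.0 * hits / len(pairs)


# Hypothetical mean wpm scores at four successive levels of one series.
hits, total, pct = share_of_pairs([90.0, 80.0, 85.0, 70.0])

# The reported whole-number percentages match the stated counts of 17.
reported = [round(100 * k / 17) for k in (13, 11, 9)]
```

A 10-level series yields 9 adjacent pairs and a 9-level series yields 8,
which accounts for the 17 pairs across the two series.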



Insert Figure 1 about here 

Standard deviations of the mean scores plotted in Figure 1 ranged
from 47.8 to 37.5 for wpm scores, 31.6 to 39.0 for pc scores, and 9.9
to 20.7 for epm scores. Given this variability, a congruency analysis
was undertaken to explore the regularity with which each subject's
performance reflected sample passages' increasing difficulty. An index of
the degree of variability of subjects' performance, calculated for each
instructional criterion and for both series, was defined as the percentage
of subjects (a) failing to meet the instructional criterion at a level
lower than the one where that criterion had been met successfully, and/or
(b) meeting the instructional criterion at a level higher than one at
which the criterion already had been failed. Averaged across the seven
instructional criteria and the two basal series, 55.00% of the subjects
showed this inconsistency in performance. For the traditional IRI
standard, 95% accuracy of word recognition, 56.00% of the subjects
demonstrated this inconsistency.

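
The inconsistency index defined above can be sketched in Python. The
function names and pass/fail records are hypothetical. Conditions (a)
and (b) both describe a failed level lying below a level that was met,
so one check covers them.

```python
def is_inconsistent(met):
    """True if the criterion is met at a level above one where it failed.

    `met` lists pass/fail results ordered from lowest to highest level.
    """
    seen_fail = False
    for ok in met:
        if not ok:
            seen_fail = True
        elif seen_fail:           # a pass above an earlier failure
            return True
    return False


def inconsistency_index(subjects):
    """Percentage of subjects with at least one such reversal."""
    flags = [is_inconsistent(met) for met in subjects]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical pass/fail records for three subjects on one criterion.
records = [
    [True, True, False, False],   # consistent: passes, then fails
    [True, False, True, False],   # inconsistent: pass above a failure
    [False, True],                # inconsistent
]
index = inconsistency_index(records)
```
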
Validity of Alternative Instructional Criteria 

Correlational and congruency analyses were employed to determine
the validity of the seven instructional criteria. 

Correlational analysis. First, a correlational matrix was
constructed that included each of the 14 instructional level scores
(seven criteria x two basal series) and the raw scores on the two
achievement tests. Correlations ranged from +.57 to +.95, reflecting
the extent to which subjects' scores at the instructional level
predict, or are valid, with respect to subjects' scores on the
standardized achievement tests. Of 28 correlations (14 instructional level
scores x 2 achievement test scores), 23 were greater than +.80.
Averaged within instructional criteria, the mean correlations for
Criterion 1 through Criterion 7 were +.93, +.88, +.62, +.85, +.85,
+.86, and +.90, respectively. Correlations, then, for all of the
criteria except for Criterion 3 were high and similar to each other.

Congruency analyses. Two congruency analyses explored the extent
of agreement between instructional level scores and three criterion
measures. The criterion measures were (a) teachers' actual level of
placements of subjects in the Ginn 720 series, (b) subjects' performance
on the WI test, and (c) subjects' performance on the PC test. The
first of these analyses examined whether subjects' reading levels,
defined by each of the instructional criteria, were the same as, higher,
or lower than subjects' reading levels denoted by each of the three
criterion measures. Reading levels designated by instructional criteria
were perceived as in agreement with teacher placements when instructional
level scores fell within a range of two consecutive texts in the Ginn
720 series (-1 level ≤ x ≤ +1 level), or within an average of .88 grade
levels. An instructional score was considered to be congruent with the
two achievement tests when the instructional score was within 1.0 grade
levels. Correlated t tests applied to the differences between
instructional level scores and each of the three criterion measures constituted
the second congruency analysis.
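
The first congruency analysis can be sketched in Python. The function
names and the sample data are hypothetical, and the inclusive tolerance
(a difference of exactly one level counts as agreement) is an assumption
about how the boundary was handled.

```python
from collections import Counter


def classify(score, reference, tolerance):
    """Label an instructional score relative to a criterion measure."""
    diff = score - reference
    if diff > tolerance:
        return "high"
    if diff < -tolerance:
        return "low"
    return "same"


def placement_percentages(scores, references, tolerance):
    """Percentage of subjects placed high, low, and the same."""
    counts = Counter(
        classify(s, r, tolerance) for s, r in zip(scores, references)
    )
    n = len(scores)
    return {label: 100.0 * counts[label] / n
            for label in ("high", "low", "same")}


# Hypothetical levels for four subjects: IRI-derived versus teacher-placed.
iri_levels = [3, 5, 2, 4]
teacher_levels = [3, 3, 4, 5]
result = placement_percentages(iri_levels, teacher_levels, tolerance=1)
```

For the achievement test comparison, the same functions would take
grade scores with a tolerance of 1.0 grade levels.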

Table 3 displays the percentages of subjects placed high, low,
and accurately with respect to teacher placements. Employing
Criteria 4 through 7, the instructional scores placed similar percentages
of subjects high, low, and accurately. Across the four performance
standards, an average of 64.50% of the subjects were placed correctly,
17.00% were placed low, and 18.50% were placed high. Using Criterion
2, the extent of agreement was proportionately similar; however, a
smaller percentage was placed correctly (53.00%) and greater
percentages of subjects were placed high (29.00%) and low (18.00%).
Instructional Criterion 3 placed low a relatively large percentage of subjects
(58.00%) and Criterion 1 placed high a comparatively large percentage
of subjects (50.00%).



Insert Table 3 about here 



Correlated t tests corroborated this pattern of congruency for
the different instructional criteria. For Criteria 1 and 2, the
difference between the instructional scores and the teacher placements was
statistically significant, t(89) = 8.42, p = .000 for Criterion 1 (mean
difference = 1.87 levels) and t(89) = 2.29, p = .000 for Criterion 2
(mean difference = .54 levels). For Criterion 3 the difference also
was statistically significant, t(89) = 7.72, p = .000. This time,
however, the teacher placements were higher than the instructional
scores (mean difference = 2.32 levels). For Criteria 4-7, there were
no statistically significant differences.

The degree of congruency between the instructional level scores
in both basal series and the PC and WI tests also was examined. Each
instructional level score was converted to its corresponding
readability grade score (see Table 1). The readability grade score for each
instructional criterion then was compared to both the WI and PC grade
equivalency scores for every student to determine the percentage of
students placed high, low, and accurately by each instructional
criterion. Therefore, there were four combinations of congruency percentages
and four series of correlated t tests: Ginn 720 series instructional
grade scores with PC and WI grade scores, and Scott-Foresman Unlimited
instructional grade scores with PC and WI grade scores.

The average percentages across these four combinations are presented
in Table 4. The extent of congruency was similar for Criteria 4-7, with
an average of 51.39% of students placed the same, 10.18% placed high,
and 38.43% placed low. Criterion 2 placed correctly a similar percentage
(51.50%) with a more even distribution between low (21.50%) and high
(26.50%) placements. Criterion 3 placed low a large percentage of
students (60.25% placed low, 33.00% placed the same, and 1.00% placed
high), while Criterion 1 placed high a large percentage of students
(43.25% placed high, 11. £S% placed low, 44.75% placed the same).
Again, correlated t tests corroborated this pattern of congruency
for different instructional criteria. For Criteria 1 and 3, the
difference between the instructional grade scores and achievement test
grade scores always was statistically significant: for Criterion 1,
t(91) < 3.55, p = .001 and for Criterion 3, t(91) < 5.33, p = .000.
Criterion 1 placed students high by an average .55 levels and Criterion
3 placed students low by an average 1.20 levels, with respect to
standardized test performance. The average difference was the smallest
for Criterion 2 (.11 levels).

Discussion 

The purpose of this investigation was to explore the reliability
and validity of the following prominent IRI procedures: (a) choosing
a 95% word recognition accuracy standard for determining instructional
level; (b) arbitrarily selecting a passage to represent the difficulty
level of a basal reader; and (c) employing one-level floors and ceilings.
Findings of this study support the technical adequacy of one of these
procedures, but question the adequacy of the remaining two.

Results support the use of the traditional IRI standard of 95%
for accuracy of word recognition. This standard of instructional level,
as well as several other criteria used in informal reading assessment,
exhibits validity with respect to standardized achievement tests. As
evidence of this validity, correlations between instructional level
scores and achievement test raw scores were high and statistically
significant, except when Criterion 3 was employed. Criterion 3 was
the level at which a student read at 100 wpm with 0-2 errors. This
criterion, the most stringent, placed many students at low reading
levels, failing to discriminate effectively among readers with
different skills and resulting in lower correlations with achievement tests.

Two congruency analyses supplemented the correlational examination
of the validity of IRI instructional performance standards. These
analyses were: (a) the percentages of students placed low, high,
and the same with respect to criterion measures, and (b) correlated
t tests on the difference between the instructional level scores
and the scores generated by criterion measures. These congruency
analyses revealed that, despite its high correlations with the
standardized tests, Criterion 1 yielded instructional level scores that
did not agree well with either of the criterion measures: teacher
placements or the standardized tests. Criterion 3, which resulted
in the lowest correlations with standardized tests, also produced
instructional level scores that agreed poorly with both criterion
measures.

To determine the acceptability of an instructional criterion, the
following arbitrary standard was adopted. It had to produce scores
that resulted in (a) correlations with standardized achievement tests
of at least +.80; (b) at least 50.00% congruency with teacher placements
and standardized tests; and (c) an average difference of no more than
one-half level between instructional level scores and teacher
placements and standardized tests. Given this standard of acceptability,
Criteria 2, 4, 6, and 7 appear acceptable. Criterion 2 is 70+ wpm
with 10 or fewer errors (86% accuracy). Criterion 4 is 95% accuracy,
the traditional IRI instructional criterion. Criteria 6 and 7 employ
different oral reading rates for primary (50 wpm) and intermediate
(70 wpm) readers as they employ 95% and 95/85% accuracy, respectively.
Any one of these four criteria demonstrates strong concurrent validity
(as reflected in the correlations with standardized achievement tests)
as well as good agreement with criterion measures. Each appears to
be a good choice for use in an IRI.

Therefore, the external validity of several performance standards,
including the popular IRI instructional performance standard, was
demonstrated in the present investigation. The strength of this
conclusion, however, is tempered in light of two deviations from
standard IRI procedure. First, in contrast to the typical one-level
ceiling, a two-level ceiling was employed to determine instructional
levels. A second deviation, also relevant to the remaining discussion,
is that reading performance was timed in this study and students were
stopped at the completion of 60 seconds.
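
The three-part acceptability standard described above can be expressed as a simple check. The sketch below is illustrative only: the class and field names are hypothetical, the example values are invented, and it assumes that the congruency and average-difference thresholds apply to teacher placements and standardized tests separately. The text does not say whether the two measures were evaluated separately or averaged.

```python
from dataclasses import dataclass


@dataclass
class CriterionSummary:
    correlation: float        # correlation with standardized achievement tests
    pct_same_teacher: float   # % of students placed the same as teacher placement
    pct_same_tests: float     # % of students placed the same as standardized tests
    mean_diff_teacher: float  # average difference in levels vs. teacher placement
    mean_diff_tests: float    # average difference in levels vs. standardized tests


def is_acceptable(s: CriterionSummary) -> bool:
    """Apply the arbitrary acceptability standard to one instructional criterion.

    Assumption: each threshold must hold for both criterion measures.
    """
    return (
        s.correlation >= 0.80
        and s.pct_same_teacher >= 50.0
        and s.pct_same_tests >= 50.0
        and abs(s.mean_diff_teacher) <= 0.5
        and abs(s.mean_diff_tests) <= 0.5
    )


# Invented summaries for two hypothetical criteria (not values from this study).
lenient = CriterionSummary(0.85, 61.0, 52.0, 0.20, -0.30)
strict = CriterionSummary(0.85, 47.0, 44.75, 0.10, 0.55)
```

Here `is_acceptable(lenient)` holds, while `is_acceptable(strict)` fails on both the congruency and the average-difference conditions.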

With respect to the two other commonly employed IRI procedures,
results of the present study question the typical passage selection
procedure as well as the use of one-level ceilings and floors. First,
for over one-half of the 19 books employed in the investigation,
adequate readability representation was not achieved until 10 or more
passages were sampled. Therefore, the common practice of arbitrarily
selecting passages from a book to represent the difficulty of the
material in that text appears inadequate, and may jeopardize the
confidence with which educators can interpret IRI results.

Second, despite the use of representative passages that, in fact,
did increase in difficulty within each reading series, students'
performances did not necessarily weaken as a function of this increasing
difficulty. An average of only one-half to three-quarters of mean
performance scores decreased on adjacent passages. Additionally, for
an average of over one-half of the subjects, (a) performance standards
were met at levels higher than a level that the student already had
failed, and/or (b) the standards were not met at levels lower than
one at which the student had succeeded. These findings seriously
question the assumption often held by advocates of IRIs that a student's
performance is consistently adequate below a one-level floor or that
his/her performance is consistently inadequate above a one-level ceiling.
To proceed on the basis of such an assumption may produce inaccurate
estimates of pupils' instructional levels.
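
To make the floor/ceiling problem concrete, the sketch below flags a student whose record contains a success above an earlier failure, and compares the level a one-level ceiling would assign with the highest level actually met. It is a hypothetical illustration: the function names are invented, levels are assumed to be ordered from easiest to hardest, and the one-level ceiling is assumed to stop testing at the first failed level.

```python
def has_reversal(met):
    """True if the standard is met at some level above a level already failed.

    `met` is a list of booleans ordered from the easiest to the hardest level.
    """
    failed_earlier = False
    for ok in met:
        if not ok:
            failed_earlier = True
        elif failed_earlier:
            return True
    return False


def one_level_ceiling(met):
    """Index of the highest level met before the first failure, or None."""
    level = None
    for i, ok in enumerate(met):
        if not ok:
            break
        level = i
    return level


def highest_level_met(met):
    """Index of the highest level at which the standard was met, or None."""
    hits = [i for i, ok in enumerate(met) if ok]
    return hits[-1] if hits else None


# Invented record: the student fails the third level but passes the fourth.
record = [True, True, False, True, False]
```

For this invented record, a one-level ceiling stops at index 1 although the standard is also met at index 3, which is the kind of discrepancy the findings above describe.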

The findings of this study thus suggest that IRI procedures for
selecting passages from basal texts and for sampling pupils' performance
at instructional levels may have a negative effect on current educational
practice. Alternate approaches to current procedures include: (a)
identifying representative passages with readability formulae instead
of employing arbitrarily selected passages to represent a text's
difficulty level, and (b) requiring students to read representative
passages from each level of a text rather than using a floor/ceiling
approach. These alternate procedures may reduce error and may possess
greater technical adequacy than current practice; however, they may
reduce dramatically IRIs' appeal to practitioners. Curriculum-based IRIs
seem to be popular as an informal assessment procedure because of the
ease with which they can be created within any curriculum and then
implemented. Relatively elaborate procedures for creating and
administering curriculum-based IRIs may make them infeasible for
classroom use.

We believe that another methodological option combines logistical
feasibility with a capacity to sample both reading materials and pupils'
competencies with greater validity. Epstein (1980) has suggested that
sampling over occasions and over test forms is a widely ignored method
for reducing measurement error and for increasing the likelihood of
replicable findings. Based on this premise, an alternate strategy
consists of creating parallel forms of IRIs, administering them on
consecutive days, and then aggregating pupils' reading performances over
days or continuing administrations until results agree on at least two
consecutive days. By testing over alternate forms, error stemming
from nonrepresentative passages would be reduced because each day new
passages would be employed; by assessing over occasions, error resulting
from transitory student, examiner, situational, and procedural
characteristics in testing also would be diminished. Additionally, by
more stringently demanding agreement in results on at least two
consecutive days or by aggregating performance over days to determine
results, this procedure might reduce error that stems from the lack
of consistency in the deterioration of student performance through a
series of passages of increasing difficulty. For example, Lovitt and
Hansen's (1976) data revealed that a student's performance did not
consistently worsen as a function of increasingly more difficult passages
on any one day. Yet, when averaged over five days, the student's
performance did progress more consistently through the passages. While
these procedures may be more time consuming than current practices,
they still appear feasible and do not demand additional teacher training
as other procedures might require.
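
The two decision rules proposed above, agreement on consecutive days and aggregation over days, can be sketched as follows. The function names and sample values are hypothetical, and the use of the median for aggregation is an assumption made for illustration; the text specifies only that performance is aggregated, and the Lovitt and Hansen example averaged over days.

```python
import statistics


def level_by_agreement(daily_levels, required=2):
    """Return the first level obtained on `required` consecutive days.

    `daily_levels` holds one instructional level per day of testing, in
    order. Returns None if no level recurs often enough, in which case
    administrations would continue.
    """
    run = 0
    previous = None
    for level in daily_levels:
        run = run + 1 if level == previous else 1
        previous = level
        if run >= required:
            return level
    return None


def level_by_aggregation(daily_levels):
    """Aggregate daily instructional levels (median is an assumed choice)."""
    return statistics.median(daily_levels)


# Invented daily results from parallel forms (illustration only).
agreed = level_by_agreement([3, 4, 4, 5])          # days 2 and 3 agree
pending = level_by_agreement([3, 4, 3, 4])         # no agreement yet
pooled = level_by_aggregation([3, 4, 4, 5, 4])
```

Either rule needs only the daily instructional levels as input, so it adds record keeping but no new scoring procedure.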
Table 1

Level Numbers, Grade Levels, and Readability
Information on Passages from Two Reading Series

                                    Mean Readability                     Mean Readability
                  Level    Grade    Score Across                         Scores of Two
Series            Number   Levels   Passages           N a      SD b     [Passages]

Ginn 720          3-4      PP-P     2.02               8        .098     [missing]
                  5        1-1      2.21               5        .117     [illegible]
                  6        2-1      2.43               6        .196     2.43
                  7        2-2      3.17               13       .536     [illegible]
                  8        3-1      3.60               10       .468     3.00
                  9        3-2      4.11               [illegible]  .142  4.03
                  10       4        5.00               11       .476     5.00
                  11       5        5.38               10       .534     5.36
                  12       6        5.81               14       .392     5.75
                  13       7        6.00               13       .593     6.03

Scott-Foresman    2-3      PP-P     2.57               9        .439     2.57
                  4        1        2.73               5        .156     2.77
                  5-6      2-1      2.87               10       .282     2.95
                  7-8      2-2      3.29               7        .293     3.30
                  9-10     3-1      3.64               9        .754     3.59
                  11-12    3-2      4.02               13       .520     3.94
                  13-15    4        4.89               5        .252     4.82
                  16-18    5        5.64               11       .525     5.70
                  19-21    6        6.04               13       .144     6.03

a Number of passages required to achieve representativeness.
b Standard deviation across passages.
Table 2

Differences in Readability Scores Between Each Consecutive
Pair of Passages in the Ginn 720 and Scott-Foresman Series

                  Publisher's
                  Level               Difference    t         p
Series            Number              in Mean       Value     Value

Ginn 720          3-4   vs. 5         .19           -2.30     .050
                  5     vs. 6         .22           -2.31     .050
                  6     vs. 7         .74           -3.49     .003
                  7     vs. 8         .43           -2.79     .011
                  8     vs. 9         .51           -3.17     .009
                  9     vs. 10        .89           -5.78     .000
                  10    vs. 11        .38           -1.70     .107
                  11    vs. 12        .43           -2.17     .045
                  12    vs. 13        .19           - .78     .441

Scott-Foresman    2-3   vs. 4         .16           -1.32     .198
                  4     vs. 5-6       .14           -1.25     .235
                  5-6   vs. 7-8       .40           -3.04     .009
                  7-8   vs. 9-10      .35           -1.22     .248
                  9-10  vs. 11-12     .38           -1.29     .219
                  11-12 vs. 13-15     .87           -4.92     .000
                  13-15 vs. 16-18     .75           -3.98     .001
                  16-18 vs. 19-21     .40           -1.93     .068
Table 3

Percentages of Students Placed Below, Above, and the Same as
Teacher Placements by Each Instructional Criterion (N=89) a

                  Placement by Curriculum-based Measures
                  Compared to Teacher Placement

Criterion         Below       Same        Above

7                 15          69          16
6                 19          65          14
5                 23          63          15
4                 21          61          18
3                 58          39          3
2                 18          53          29
1                 3           47          50

a No placement was reported for two students.
Table 4

Percentages of Students Placed Below, Above,
and the Same as Achievement Test Scores by
Each Instructional Criterion (N=91) a

                  Curriculum-based Grade Scores Compared to
                  Achievement Test Grade Scores

Criterion         Below       Same        Above

7                 32.50       58.00       8.75
6                 40.00       51.75       [?].50
5                 42.50       49.00       7.7[?]
4                 39.25       46.50       13.50
3                 61.00       38.00       1.00
2                 26.50       51.50       21.50
1                 11.25       44.75       43.25

a Percentages are across reading series and across achievement tests
(WI and PC).
Figure 1. Number of words correct and errors per minute, and
percentage correct in levels 1-10 of Ginn 720, and levels 1-9
of Scott-Foresman. Multiply units by 20.